WSUCK.HLP (from wsuck097.zip), OS/2 Help File, 1996-11-10
═══ 1. WebSucker ═══
WebSucker 0.97
WebSucker can retrieve Web pages from an HTTP (WWW) server. It can be configured
to follow all hyperlinks on the page that lead to other pages on the same
server. Images on the pages can be retrieved as well. All pages are stored on
disk and can be viewed later using your web browser.
WebSucker can make use of a proxy HTTP server, speeding up the whole procedure.
WebSucker requires at least one HPFS partition!
Topics:
The main window
Common tasks
Command line options
For the techies
Contacting the author
═══ 1.1. The main window ═══
On the main window you find the following elements:
A drop-down list where you enter the URL. The last 15 URLs are saved.
"Start", "Stop" and "Skip" buttons.
A log window. Its contents are also stored in the log file.
A status line. Its contents are:
- the current URL
- total number of data bytes retrieved
- total number of data bytes of the current URL
- number of bytes retrieved of the current URL
- number of URLs retrieved
- number of URLs tried
- number of URLs queued for inspection (estimated).
═══ 1.2. Common tasks ═══
Here's how to perform some common tasks with WebSucker:
I want to suck a complete web site.
In the setup, enable "Follow links", "Inline images". Disable "Don't climb up".
Then enter the root URL of the site (e.g. "http://www.thesite.com/"), then
press "Start".
I want to suck a subrange of a web site.
In the setup, enable "Follow links", "Inline images" and "Don't climb up". Then
enter the URL of the site (e.g. "http://www.thesite.com/some/path/start.html"),
then press "Start".
I want to suck a single web page with images, but only if it's changed.
In the setup, disable "Follow links". Enable "Inline images" and "Modified
pages only". Then enter the URL of the page (e.g.
"http://www.thesite.com/pageofinterest.html"), then press "Start".
═══ 1.3. Command line options ═══
WebSucker can be run in automated mode, i.e. it takes one or more URLs as
program parameters, downloads these pages according to the program options, and
exits when finished.
The command line syntax is:
WSUCK.EXE [<url> | @<listfile>]*
In other words, you can specify
one or more URLs, and
one or more list files. Each line in a list file is interpreted as a URL.
Empty lines and lines starting with ';' are ignored.
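The list-file rules can be sketched in a few lines of Python (an illustration, not WebSucker's code; whether surrounding whitespace is stripped is an assumption):

```python
# Each line of a list file is a URL; empty lines and lines
# starting with ';' are ignored.
def read_listfile(text):
    urls = []
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace ignored
        if not line or line.startswith(";"):
            continue  # blank line or comment
        urls.append(line)
    return urls
```

For example, a file containing a ';' comment, two URLs and a blank line yields just the two URLs.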
When finished, WebSucker returns one of the following ERRORLEVEL values:
0 Everything OK
1 Invalid command line option
2 Problem(s) with one of the list files
10 Other error
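In automated use, a calling script can map the returned ERRORLEVEL to a message. A small Python sketch of the table above (the describe helper is hypothetical, not part of WebSucker):

```python
# The ERRORLEVEL table as a lookup for scripts that run WSUCK.EXE.
ERRORLEVELS = {
    0: "Everything OK",
    1: "Invalid command line option",
    2: "Problem(s) with one of the list files",
    10: "Other error",
}

def describe(code):
    return ERRORLEVELS.get(code, "Unknown ERRORLEVEL")
```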
═══ 1.4. For the techies ═══
Here's some technical information if you're interested:
WebSucker uses HTTP 1.0. HTTP 0.9 is not supported. If some web site is
still using an HTTP 0.9 server, its contents may be just as outdated, so
you might not miss anything.
WebSucker only follows HTTP links, not FTP or others.
WebSucker counts <IMG SRC=...> and <BODY BACKGROUND=...> as inline
images.
If the file name of a retrieved page isn't specified, it's stored as
INDEX.HTML.
The "Last-Modified" timestamp is stored in the file's EAs. The EA name is
HTTP.LMODIFIED and is of type EAT_ASCII.
Some characters in the URL are converted when building the path name of
the file. However, no conversion to FAT (8.3) names is performed!
If a page is redirected, the redirection is automatically followed, but
only if the new location is on the same server!
WebSucker has been developed on and tested with OS/2 Warp 4.0. It should
also work with the following configurations:
- Warp 3.0 with IAK
- Warp 3.0 with TCP/IP 2.0
- Warp 3.0 Connect (TCP/IP 3.0)
- Warp Server
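The inline-image rule can be illustrated with a small Python sketch that collects <IMG SRC=...> and <BODY BACKGROUND=...> targets (a rough regex approximation, not WebSucker's parser):

```python
import re

# Only <IMG SRC=...> and <BODY BACKGROUND=...> count as inline images.
# Like most regex HTML parsing, this is approximate.
IMG_RE = re.compile(
    r'<(?:IMG[^>]*\sSRC|BODY[^>]*\sBACKGROUND)\s*=\s*"?([^"\s>]+)',
    re.IGNORECASE)

def inline_images(html):
    return IMG_RE.findall(html)
```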
═══ 1.5. Contacting the author ═══
WebSucker was developed by Michael Hohner. He can be reached electronically at:
EMail: miho@osn.de
Fidonet: 2:2490/2520.17
CompuServe: 100425,1754
═══ 2. File menu ═══
Exit
Ends the program.
═══ 3. Setup ═══
Options
Specify all program options.
═══ 3.1. Servers ═══
Proxy
Enter the host name of a proxy HTTP server. You may also specify a
port number for the proxy server. Check Enable to finally use the
server. Contact your service provider to get this data.
Note: Only enter the host name, not the URL (e.g. "proxy.isp.com",
not "http://proxy.isp.com:123/")!
Email address
Enter your EMail address. It is included in every request. Don't
enter anything here if you don't want your EMail address to be
revealed.
═══ 3.2. Paths ═══
Path for retrieved data
Path where retrieved pages and images are stored. This path and
subpaths are created automatically.
═══ 3.3. Logging ═══
These options control logging.
Log file
Path and name of the log file.
Additional information
Log additional (but somewhat optional) messages.
Server replies
Log reply lines sent by the server.
Debug messages
Log messages used for debugging purposes (turn on if requested).
═══ 3.4. Options ═══
These settings control which items are downloaded and how the download is done.
Follow links
If checked, hyperlinks in retrieved pages are followed. Otherwise,
WebSucker retrieves one page only.
You can enter a set of extensions (separated by spaces, commas or
semicolons) of items to retrieve. Links to items with other
extensions are ignored. If you don't enter anything, all links are
followed.
Example: With "htm html", WebSucker only follows links to other HTML
pages, but does not download other hyperlinked files.
same servers
Only links to items on the same server are followed.
don't climb up
Hyperlinks to items that are hierarchically higher than the initial
URL are not followed. Otherwise, all links to items on the same
server are followed.
Example:
If you started with http://some.site/dir1/index.html, and the
current page is http://some.site/dir1/more/levels/abc.html, a link
that points to http://some.site/otherdir/index.html wouldn't be
followed, but a link to http://some.site/dir1/x/index.html would.
all
All links (even those to other servers) are followed. Be very
careful with this option!
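The link-following rules above ("Follow links" with an extension list, "same servers", "don't climb up") can be combined into one decision function. A Python sketch under the stated rules (illustrative, not WebSucker's code):

```python
from urllib.parse import urlsplit

def follow_link(link, start_url, extensions=None, same_server=True,
                dont_climb_up=True):
    start, target = urlsplit(start_url), urlsplit(link)
    if target.scheme != "http":
        return False                      # only HTTP links are followed
    if same_server and target.netloc != start.netloc:
        return False
    if dont_climb_up:
        # Directory of the initial URL; anything outside it is "climbing up".
        base = start.path.rsplit("/", 1)[0] + "/"
        if not target.path.startswith(base):
            return False
    if extensions:
        ext = target.path.rsplit(".", 1)[-1].lower()  # crude extension grab
        if ext not in extensions:
            return False
    return True
```

With the example from the text, a link to http://some.site/otherdir/index.html is rejected while http://some.site/dir1/x/index.html is followed.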
Inline images
If checked, inline images are also retrieved.
from other servers
If checked, inline images located on other servers are also
retrieved. Otherwise only images from the same server are
downloaded.
Java applets
If checked, Java applets are also retrieved.
from other servers
If checked, applets located on other servers are also retrieved.
Otherwise only applets from the same server are downloaded.
Retrieve modified items only
An item is only retrieved if it's newer than the local copy.
Strongly recommended!
Max link depth
Limits the depth of links to follow to the specified number. A level
of "1" specifies the initial page.
Example:
If page A contains a link to B, and B contains a link to C, A would
be level 1, B would be level 2 and C would be level 3. A maximum
link depth of "2" would retrieve pages A and B, but not C.
Max size
Limits the size of items to download. If the server announces the
size and it's larger than the number specified, the item is skipped.
If the server doesn't announce the size, the item is truncated when
the maximum size is reached.
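The "Max link depth" counting from the example (A is level 1, B is level 2, C is level 3) can be sketched as a breadth-first crawl over a toy link graph (illustrative Python, not WebSucker's code):

```python
from collections import deque

# `links` maps each page to the pages it links to (a toy graph).
def crawl(start, links, max_depth):
    seen, queue = {start}, deque([(start, 1)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)                # "retrieve" the page
        if depth >= max_depth:
            continue                      # don't follow links any deeper
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order
```

With links A→B and B→C, a maximum depth of 2 retrieves A and B but not C, as in the example above.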
═══ 3.5. Server list ═══
A list of base URLs is displayed.
Press New to add a new URL with settings.
Press Change to change the settings of the selected URL.
Press Delete to delete the selected URL.
═══ 3.6. Server ═══
Base URL
Set of URLs (this item and all items hierarchically below) for which
these settings apply. This usually specifies a directory on a
server.
Example:
If you enter "http://some.server/basedir", these settings apply to
"http://some.server/basedir/page1.html", but not to
"http://some.server/otherdir/b.html".
User name
User name or user ID used for basic authorization.
Password
Password used for basic authorization.
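A Python sketch of how such an entry could be applied (illustrative, not WebSucker's code): a base-URL prefix match selects the settings, and the user name and password form a standard basic authorization header.

```python
import base64

# An item falls under a base URL if it is the base URL itself or
# lies hierarchically below it.
def settings_apply(base_url, item_url):
    return (item_url == base_url
            or item_url.startswith(base_url.rstrip("/") + "/"))

# Standard HTTP basic authorization: base64 of "user:password".
def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Authorization: Basic {token}"
```

With the example above, the settings apply to "http://some.server/basedir/page1.html" but not to "http://some.server/otherdir/b.html".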
═══ 4. Help menu ═══
General help
Provides general help
Product information
Displays name, version number, copyright information etc.
═══ 5. About ═══
This page intentionally left blank.